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ABSTRACT 

The epigenetic changes during B-cell development 
relevant to both normal function and hematologic 
malignancy are incompletely understood. We 
examined DNA methylation and RNA expression 
status during early B-cell development by sorting 
multiple replicates of four separate stages of pre-B 
cells derived from normal human fetal bone marrow 
and applied high-dimension DNA methylation 
scanning and expression arrays. Features of 
promoter and gene body DNA methylation were 
strongly correlated with RNA expression in 
multipotent progenitors (MPPs) both in a static 
state and throughout differentiation. As MPPs 
commit to pre-B cells, a predominantly 
demethylating phenotype ensues, with 79% of the 
2966 differentially methylated regions observed 
involving demethylation. Demethylation events 
were more often gene body associated rather than 
promoter associated; predominantly located 
outside of CpG islands; and closely associated 
with EBF1, E2F, PAX5 and other functional tran- 
scription factor (TF) sites related to B-cell develop- 
ment. Such demethylation events were 
accompanied by TF occupancy. After commitment. 



DNA methylation changes appeared to play a smaller 
role in B-cell development. We identified a distinct 
development-dependent demethylation signature 
which has gene expression regulatory properties 
for pre-B cells, and provide a catalog reference for 
the epigenetic changes that occur in pre-B-cell 
leukemia and other B-cell-related diseases. 

INTRODUCTION 

B-lymphopoiesis is a highly coordinated process initiating 
from pluripotent hematopoietic stem cells (HSCs) and 
involves multiple developmental stages. Multipotent pro- 
genitors (MPPs) develop into common lymphoid progeni- 
tors (CLPs), B-progenitor (pro-B), B-precursor (pre-B), 
immature B-cell and mature B-cell stages, each of which 
is characterized by distinct biological features (1,2). The 
key events during early stages include commitment to 
B-lineage and suppression of non-B and stem cell compo- 
nents, which are gradually intensified as the developmen- 
tal stages proceed through hierarchical lineage priming by 
different transcription factors (TPs) (3,4). It is now 
beheved that an orchestrated network of TPs has a fun- 
damental role in B-cell development (5,6). These TPs are 
shown to be indispensible in expressing functioning B-cell 
proteins and altering the genetic landscape, including 
immunoglobulin V(D)J recombination. 



*To whom correspondence should be addressed. Teh +415 514 0577; Fax: +415 502 7411; Email: joe.wiemels(«!ucsf.edu 
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors. 
© The Aulhor(s) 2012. Published by Oxford University Press. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.Org/licenses/by-nc/3.0/), which permits non-commercial 
reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. 



11340 Nucleic Acids Research, 2012, Vol. 40, No. 22 



Recent evidence suggests that TF networks are closely 
related to DNA methylation, an important means of gene 
regulation (7,8). During tissue-specific development, DNA 
hypermethylation has been implicated in the stable 
silencing of stem cell-associated TFs, such as P0U5F1 
(also known as OCT4), S0X2 and NANOG, whereas 
DNA demethylation has been implicated in the activation 
of differentiation-associated TFs and their target genes 
(9-11). In the case of B-cell development, the key TF 
genes include SPll (PU.l), BCLllA, TCF3 (E2A), 
IKZFl (Ikaros), EBFl, PAX5 and FOXOl, and their 
target genes include 1L7R, RAGl, RAG2, VPRFBl, 
IGLLl (15), CD79A (mb-1), CD19, BLNK, 1RF4 and 
many others (3-5). Although key B-cell regulators have 
generally been studied on a gene-by-gene basis in mouse 
models, genome-wide studies in human cells are urgently 
needed for comprehensive understanding of the transcrip- 
tional and epigenetic signatures of B-cell development. 

In the present study, we used genome-wide arrays to 
analyse the dynamic DNA methylation and expression 
changes during B-cell development, including the use of 
isolated and purified MPPs, pre-B-I cells, pre-B-II cells 
and immature B-cells from bone marrow isolates. We 
provide a reference methylation and expression map as 
well as a catalog of key DNA methylation changes 
during B-cell development. 

MATERIALS AND METHODS 

Acquisition and sorting of human fetal B-cells 

Human fetal bone marrow (FBM) was obtained from 
elective abortions at San Francisco General Hospital 
with the consent of the women undergoing the surgical 
procedures and with the approval of the University of 
California San Francisco's Committee on Human 
Research. FBM was extracted as previously described 
(12) from specimens from eight foetuses ranging in age 
between 20 weeks and 24 weeks of gestation as estimated 
based on foot length. Four stages of B-cell precursors were 
isolated via flow cytometry sorting (Supplementary Figure 
SI). Early progenitors were isolated based on high levels 
of CD34 protein expression (CD34^^) and a lack of ex- 
pression of the B-cell marker CD 19. This population, 
designated as stage 1 (SI), contains MPPs before hneage 
commitment (predominantly MPPs) but also containing 
CLPs and stem cells. B-cell-committed progenitors were 
isolated based on their expression of CD 19 and CD34 
(CD19^CD34^), which were predominantly pre-B-I cells 
and were designated as stage 2 (S2). Two immature B-cell 
populations were isolated that expressed CD 19, but not 
CD34, and were differentiated based on surface IgM 
(slgM) expression: stage 3 cells (S3) were predominantly 
pre-B-II cells that express sIgM"CD19^; stage 4 cells (S4) 
were predominantly immature B-cells that express 
sIgM+CD19+. 

Infinium DNA methylation profiling and data 
preprocessing 

Genomic DNA was isolated from eight FBM specimens 
using the AllPrep DNA/RNA Mini Kit (Qiagen). 



For each specimen, both DNA and total RNA was ex- 
tracted for four purified cell populations representing Sl- 
S4 (Supplementary Figure SI), giving rise to a total of 32 
isolations. DNA was modified by sodium bisulfite to 
convert unmethylated cytosines to uracil using the EZ 
DNA Methylation kit (Zymo Research, Orange, CA). 
The bisulfite-treated DNA of six specimens (24 samples) 
was amplified and hybridized onto the Illumina 
HumanMethylation450 Beadchip (HM450; Illumina, San 
Diego, CA) according to the manufacturer's specifica- 
tions. Raw data were processed using the GenomeStudio 
software (Illumina), and the average methylation beta 
values (yS) ranging from 0 (unmethylated) to 1 (fully 
methylated) were retrieved for each probe. CpG sites 
with detection /"-values >1.0x 10""^ were removed from 
analysis. Methylation samples with a high proportion 
(>4%) of suboptimal data were eliminated from 
analysis. This resulted in the removal of two samples 
with poor data quahty. 

RNA expression and data preprocessing 

Total RNA was isolated from the four purified cell popu- 
lations, S1-S4 (Supplementary Figure SI), from each of 
the eight specimens as described above. Genome-wide 
gene expression analysis was performed using the 
GeneChip Human Gene 1.0 ST Array (Affymetrix, 
Santa Clara, CA). For each sample, 1 |.ig of high-quality 
total RNA was amplified and labeled (NuGen Ovation) 
and hybridized onto the microarray according to the 
manufacturer's specifications. Data were processed and 
normalized using the Expression Console software 
(Affymetrix) with the standard Robust Multi-array 
Average (RMA) algorithm (13). One sample was 
excluded because of poor data quality, resulting in a 
total of 31 samples from 4 developmental stages in 8 
fetuses. 

ChlP-PCR analysis 

Chromatin immunoprecipitation (ChlP) was done as pre- 
viously described (14) with modifications. Briefly, sorted 
SI and S3 cells (see above) from normal bone marrow 
were crosslinked for 15min at room temperature with 
1% (wt/vol) formaldehyde (sigma, F8775), then lysed 
and sheared to 100-1000 bp in size with ChlP-lT Express 
Enzymatic Shearing Kit (Active Motif. Cat# 53035) ac- 
cording to manufacturer's instruction. Sheared chromatin 
was immunoprecipitated with lOfig anti-EBFl antibody 
(Abnova, H00001879-D01P) or 10 ng anti-BCLllA 
antibody (abeam, Cat# ab 19489) or non-specific IgG 
(Santa Cruz Biotechnology, Cat# sc-2025, performed as 
a negative control) using EZ-ChlP™ Chromatin 
Immunoprecipitation Kit (Millipore, Cat# 17-371) foflow- 
ing the kit instruction manual. The agarose beads were 
washed and the bound chromatin was eluted. The 
ChlPed and input chromatin were incubated overnight 
at 65° C for reversal of crosslinking. Samples were then 
treated with RNase A and proteinase K and purified 
with a Spin Filter from EZ ChlP — a Chromatin 
Immunoprecipitation Kit (Millipore, Cat# 17-371). 
Chromatin was end repaired, ligated to a hnker 
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and amplified by linker-mediated PCR (LM-PCR) using 
GeneAmp High Fidelity PCR System (ABl, Cat# 
4328212). Amplified ChlP and input DNA were purified 
with MinElute Reaction Cleanup Kit (Qiagen, Cat# 
28204) and normalized to ~30ng/|.d. 

Thirty nanograms DNA (pull-down and input DNA, 
normalized for DNA concentration) was analysed by 
SYBR green real-time PCR and the CFX384 Real-Time 
PCR System (BioRad, CA, USA). Primers were designed 
to four predicted EBFl targets and four independent 
BCLllA targets (predicted from ENCODE data). 
Primer sequences and amplicon locations are available 
on request. A two-step PCR cycling method was used: 
Preheat at 95°C for lOmin, and repeat 40 times at 95°C 
for 15 s, 60°C for 1 min. We calculated the fold differences 
in eight candidate loci between ChIP DNA and the input 
DNA using the following expression: Fold 
Difference = l^"^"-'"?"* DNA-cq,ip dna) experiment 
was peformed in quadruplicate, and a separate experiment 
starting from chromatin was repeated once. 

Statistical analysis 

Statistical analysis and data visualization were carried out 
using the R statistical software with Bioconductor 
packages. Methylation or expression levels between two 
groups were compared using moderated /-statistics in a 
mixed-effect model to accommodate correlations 
between samples from the same donor (15). /"-values 
were adjusted with the Benjamini-Hochberg correction 
(16). To investigate the association between differential 
expression and differential methylation, a robust linear 
regression model was used with the differential expression 
level (log2 fold changes) as the outcome and the differen- 
tial methylation level (beta differences) as the predictor, 
adjusted for basehne expression and methylation levels at 
the preceding stage. Confidence intervals and /"-values 
were derived for the regression coefficients using 2000 
bootstrapped samples. Fisher's exact test was used to 
compare the equahty of two binomial probabilities. 
Enrichment for TF motifs was done using the Analysis 
of Motif Enrichment (AME) (17) and MEME Suite 
(18); sequences 100 bp (±50 bp) around significant CpGs 
were retrieved and analysed with core motifs from the 
TRANSFAC Professional (http://www. gene-regulation, 
com) database. £- values below 0.001 were regarded as 
significant. DNA sequences, positional information of 
the NCBI reference genes, the ENCODE chromatin 
immunoprecipitation sequencing (ChlP-Seq) data, and 
CpG island (CGI) annotations were retrieved from the 
UCSC database (http://genome.ucsc.edu/). Chip-Seq en- 
richment analysis was performed by examining whether 
differentially methylated regions (DMRs) are located 
within TF binding sites of the ENCODE Chip-Seq 
database and by comparing their frequencies with those 
of all CpG sites on HM450 (Fisher's exact test). Pathway 
enrichment analysis was done using the Ingenuity 
Pathway Analysis software (Ingenuity Systems, 
Redwood City, CA). 



RESULTS 

Static methylation and expression patterns in MPPs (SI) 

The MPPs (SI) serve as the basehne DNA methylation 
levels before the cells are committed to any lineages. 
At SI, similar to a previous report (19), we observed 
lower methylation levels near the promoter region 
compared to the body of a gene (Supplementary Figures 
S2A and B). Interestingly, while CGIs were in general 
depleted of methylation, we found that CGIs located in 
the body region were more methylated and more variable 
than CGIs sites lying within promoters (Supplementary 
Figures S2B). 

We next compared methylation levels in the SI samples 
with the histone modification ChlP-seq data of the 
lymphoblastoid cell line GM 12878 from the ENCODE 
project (20) as DNA methylation is believed to interact 
with histone modifications to regulate gene expression 
(7,8). There was a strong negative correlation between 
DNA methylation levels and the presence of histone 
marks that target actively transcribed genes. The median 
methylation value was low in H3K4Mel, H3K4Me3 and 
H3K79me2 peaks, which preferentially mark enhancers, 
promoters and transcriptional transition regions, respect- 
ively, whereas the value was high in repressive mark, 
H3K27me3. In addition, Polycomb target genes tended 
to be hypomethylated whereas TETl target genes 
(derived from human homologs of mouse TETl targets) 
tended to be semimethylated (Figure lA) at SI. TET2 
targets would be more appropriate to examine in hemato- 
poietic cells but have not been described. Similar methy- 
lation profiles were also observed in the other stages 
(results not shown). 

Although it has been well-estabhshed that methylation 
in the promoter region is involved in transcriptional 
silencing (21), httle is known about how body methylation 
affects gene expression and whether promoter and body 
methylation regulate expression in concert. To examine 
this, we partitioned promoter and body methylation 
levels into hypo- (yS<0.3), semi- (0.3<j8<0.7) and 
hypermethylated (y6 > 0.7) groups and compared gene 
expression levels according to these different combinations 
in Figure IB. The most highly expressed group was that 
with both promoter and body hypomethylated whereas 
the lowest was with both regions hypermethylated, and 
the mean difference in gene expression between these 
two groups was 2.8-fold (/" < 2.2 x 10~'*, two-sample 
/-test). Interestingly, when the gene body region is 
devoid of CpG sites, the repressive nature of promoter 
methylation is at its most potent with a striking 3.6-fold 
difference in expression between the hypo- and 
hypermethylated promoter groups whereas the same dif- 
ference is 2-fold in genes with body CpG sites (Figure IB, 
right). We found many housekeeping genes such as 
GUKl, CENPB, RPS2, NME2 and UBE2M belong to 
this group of genes lacking body methylation sites. To 
formally model the relationship between expression and 
methylation, a robust linear regression model was fit to 
the expression levels with promoter and body methylation 
and their interactions as predictors, and the regression 
coefficients were —1.55 (promoter, P< 0.001 using 2000 
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Figure 1. Baseline methylation and expression levels at SI. (A) Methylation levels inside and outside of annotation categories, including histone 
modifications, Polycomb target genes and TETl target genes. (B) Association of DNA methylation with gene expression. Median methylation levels 
of CpG sites mapped to the promoter region and the body region of a gene were obtained together with its gene expression level as measured by the 
Affymetrix Gene 1.0 ST Array. Methylation levels were then categorized as hypomethylated, semimethylated and hypermethylated based on the 
following beta cutoffs, respectively: fS <0.3, 0.3 < /i <0.7 and fS>OJ. Genes that have no body-associated CpG sites (mainly small housekeeping 
genes) are plotted at the right side of the figure. The number of genes in each group was labeled at the top of the graph. The most highly expressed 
group was that with both proinoter and body hypomethylated whereas the lowest was with both regions hypermethylated, and the mean difference in 
gene expression between these two groups was 2.8-fold (P<2.2x 10^", two-sample r-test). 



bootstrapping), -0.46 (body, 0.001) and 0.05 (inter- 
action, P = 0.36), respectively. This quantification con- 
trasts the extent to which promoter and body 
methylation controls expression and confirms that 
promoter methylation exerts stronger control over tran- 
scription. But intriguingly, when promoter methylation is 
held constant, increasing body methylation is associated 
with more repressed expression, suggesting a fine-tuning 
mechanism of body methylation for transcription 
regulation. 



Alterations in DNA methylation associated with pre-B-cell 
development reveals a dominant demethylation signature 

After examining the static methylation and expression 
profiles of the lymphoid precursor (SI), we next sought 
to identify DMRs between any two subsequent stages 
(S1->S2, S2->S3 and S3->S4). We chose the CpG sites 
that had Benjamini-Hochberg-adjusted /"-values <0.05 
for the moderated ^-test and yS-difference (A/J)>0.2. 
Although two distinct types of DMRs were observed 
during stage progression, those that had a loss of 
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Figure 2. Differential methylation analysis of B-cell developmental stages (stages 1-4; S1-S4). DMRs between any two subsequent stages were 
identified if the FDR-adjusted P-value for the r-test was <0.05 and the \AfS\ value was >0.2. (A) A volcano plot with maximal -log P and maximal 
|A/i| between any two subsequent stages for each CpG site shows a negative skewness of AfS. (B) Density plots of the methylation level at SI for 
CpG loci that are identified as de-DMRs and de novo DMRs. (C) Scatter plots of methylation levels of any two subsequent stages (SI versus S2, S3 
versus S2 and S4 versus S3). Red and blue dots highlight de novo and de-DMRs, respectively. (D) A hierarchical clustering of the 22 samples using 
the core de-DMRs-classified samples according to their stages. The inset shows boxplots of methylation levels of core de-DMRs. (E) Association of 
core de-DMRs with non-CGI regions and gene body regions. (F) ChlP-PCR analysis of four BCLllA (top) four EBFl (bottom) targets. 
ENCODE-described TF binding sites which were also de-DMRs were pulled down using ChIP techniques in SI and S3 cells. The amount of 
each targeted sequence was compared in the pull-down relative to the input DNA using quantitative PGR, and the fold enrichment in SI and S3 cells 
shown in the figure. The analysis demonstrated that S3 cells harbored between 10-fold (for Ci)22/EBF1 pull-down) and 3073-fold (for NACA2I 
BCLlla pull-down) more sequence bound to the indicated TFs compared to input, in most cases orders of magnitude more than in SI cells. 
Non-specific IgG control pull-down displayed at maximum only a 3.1-fold enrichment of target sequence compared to control (data not shown). The 
experiment was performed in technical quadruplicate and repeated once froin the chromatin, with similar results. 

(continued) 
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Figure 2. Continued. 



Fold-enrichment over input 



methylation, which we designate as de-DMRs, and those 
that obtained more methylation, which we designate as de 
novo DMRs, de-DMRs were conspicuously dominant 
during stage progression. A scatter plot between -logio 
P and A/^ of any two sequential stages showed that sig- 
nificant CpG sites were skewed to negative Afi values 
(Figure 2A). We next assembled the 2987 unique 
de-DMRs along the stage progression and designated 
them as the 'core de-DMRs'; as expected, these loci were 



hkely to be more methylated at SI than de novo DMRs 
(Figure 2B). During S1->S2, de-DMRs (2330 CpG sites; 
1039 genes) outnumbered de novo DMRs (636 CpG sites; 
421 genes) by almost 4 to 1. DMRs between S2->S3, and 
S3->S4 are exclusively de-DMRs, 828 and 139 CpG sites 
(455 and 81 genes), respectively (Figure 2C). Not only did 
de-DMRs far outnumber de novo DMRs loci, there was 
also a significant overlap between these two sets from 
S1->S2 (P<2.2x 10"'''), which suggests that 
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demethylation plays a more significant role during pre-B- 
cell development but de novo methylation likely plays only 
a secondary and accessory role. 

We observed only a small number of DMRs between S3 
and S4, suggesting that these two stages are very similar in 
terms of their methylation profiles. A hierarchical cluster- 
ing of the 22 samples using the 2987 core de-DMRs per- 
fectly classified each sample into its relevant stage, with S3 
and S4 samples closely assigned into the same sub-tree 
(Figure 2D). Interestingly, both the clustering heatmap 
and the boxplots indicate that loss of methylation is a 
continuous event during pre-B-cell development and 
once a CpG site incurs a loss of methylation, it remains 
or becomes even more demethylated throughout subse- 
quent stages (Figure 2D). 

The core de-DMRs in B-cell development predomin- 
antly occurred in the CpG sites that are located in the 
gene body regions and outside of the CGIs, and depleted 
in promoter regions within the CGIs (Figure 2E). Among 
the core de-DMRs, 23% and 42% map to the promoter 
and body regions of a gene, respectively; a 1.8-fold prefer- 
ence for the body regions, in contrast to the background 
frequencies by all the CpG sites on the HM450 array, 
which have roughly equal amount of promoter or body 
CpG sites (36%) and 34%, respectively). Interestingly, 
only 35 among the >1000 genes demethylated from SI to 
S2 acquired both promoter and gene body changes, 
indicating that methylation status in these two regions 
were altered by independent processes. The partiality 
toward CpG sites outside of CGIs is even more striking 
with only 5% of the core de-DMRs located inside of the 
CGIs as opposed to 31% background frequency on the 
array (Figure 2E). 



Using Chip-Seq enrichment data from the ENCODE 
project which combines data from 95 different cell fines 
and 152 transcription-factor (TF) targeting antibodies, 
DMRs were frequently observed in locations associated 
with specific TFs; with specific TF enrichment being stage 
dependent. From SI to S2, EBF binding sites are very sig- 
nificantly enriched for de-DMRs (fold enrichment = 10.94; 
Fisher's exact test P <2.2x 10~'^, Table 1), supporting its 
principal role in B-cell development. On the other hand, 
de novo DMRs were significantly enriched for CEBPB, 
c-Jun and p300. From S2 to S3, de-DMRs were preferen- 
tially enriched with IRF4 and MEF2A binding sequences 
(Table 1). This specific enrichment of TF binding sites in 
B-ceU development DMRs was further confirmed by motif 
enrichment analysis using AME and the TRANSFAC 
motifs. From SI to S2, EBF-related motifs, V$EBF_Q6 
and VSOLFl Ol, were the most significantly associated 
with de-DMRs followed by other ETS and E2A motifs, 
whereas de novo DMRs were significantly enriched for 
PU.l, SPIl, ELFl and SPIB motifs (Supplementary 
Table SI). To test whether the demethylation events at 
TF binding sites was functionally significant, we performed 
ChlP-PCR analysis with two of the most significant TFs, 
EBFl and BCLl 1 A, comparing multipotential cells (SI) to 
committed pre-B ceUs (S3). ChlP-PCR analysis revealed 
that predicted targets were not occupied in SI cells but 
strongly and specifically occupied for each TF during S3 
(Figure 2F). 

Alterations in RNA expression associated with pre-B-cell 
development 

We next analysed the concomitant RNA expression 
changes within the same samples in order to examine 



Table 1. Fold enrichment of DMRs for TF binding sites derived from ENCODE ChlP-Seq data 



TFs Known functions Fold enrichment" 



de novo DMR (S1-S2) de-DMR (S1-S2) de-DMR (S2-S3) de-DMR (S3-S4) 



EBFl 


Early B-cell development 


1.96 


10.94 


8.28 


5.65 


BCLllA 


Late B-cell development. Interact with BCL6 


5.36 


8.48 


16.06 


15.44 


BATE 


Interacts with Jun family proteins 


8.12 


5.84 


8.91 


11.60 


PAX5 


Early B-cell development 


1.13 


4.08 


6.00 


3.46 


MEF2A 


Control cell growth and apoptosis 


2.21 


3.17 


5.88 


6.58 


IRF4 


Late B-cell development. Regulate interferon 


2.95 


3.14 


6.44 


7.00 


TCF12 


Development of B- and T-cells 


0.96 


2.82 


3.66 


5.97 


PU.l 


Myeloid and B-cell development 


4.73 


1.94 


4.18 


7.58 


p300 


Cell proliferation and differentiation 


9.98 


1.65 


2.01 


0.92 


c-Jmi 


Bind to DNA and regulate expression 


6.07 


1.24 


1.48 


1.64 


CEBPB 


Myeloid development 


5.13 


0.46 


0.62 


0.53 


TAFl 


General TF (TFIID) 


0.55 


0.33 


0.37 


0.61 


GABP 


Control of mitochondrial function 


0.91 


0.21 


0.15 


0.33 


Sin3Ak 


Co-repressor interacting with HDACl and MECP2 


0.38 


0.19 


0.28 


0.39 


HEYl 


Neurogenesis and cardiovascular development 


0.54 


0.16 


0.14 


0.08 


IRFl 


Activator of interferon-a and -P 


0.50 


0.11 


0.08 


0.24 


Pol2 


RNA polymerase II, large subunit 


0.23 


0.09 


0.05 


0.00 


E2F1 


Cell cycle control 


0.30 


0.09 


0.08 


0.26 


E2F6 


Cell cycle control 


0.74 


0.08 


0.09 


0.00 


HMGN3 


Control chromatin fiber 


0.80 


0.08 


0.14 


0.28 


CCNT2 


Cell cycle control 


1.10 


0.07 


0.12 


0.24 


E2F4 


Cell cycle control 


0.13 


0.00 


0.00 


0.00 



"Fold enrichments were calculated by dividing the percentage of DMRs overlapping with a specific TF binding sites to the percentage of all loci on HM450 
overlapping with the same TF binding sites. Significant enrichments/depletions (Bonferroni-adjusted Fisher's exact test /"-values < 0.05) are bolded. 
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RNA expression changes associated with pre-B-cell devel- 
opment, and moreover, to distil functional methylation 
changes that had transcriptional consequences. To 
nominate differentially expressed genes (DEGs) between 
any two ensuing stages, we used the thresholds of adjusted 
P-values <0.05 (Benjamini-Hochberg adjustment) for 
moderated /-test and fold-change > 1.5, which resulted in 
a total of 3913 DEGs. Unhke DNA methylation that was 
predominantly one-directional with hypomethylation 
being the driving force, RNA expression was a 
bi-directional process with similar numbers of up- and 
down-regulated genes in total. There was, however, a 
stage dependent preference toward up-regulation or 
down-regulation. During transition from SI to S2, 
up-regulated DEGs were about two times more prevalent 
than down-regulated ones (1996 versus 675 genes); this 
together with the concomitant predominant 
demethylation leading to S2 suggests a possible role of 
methylation-regulated transcriptional changes during the 
initial B-cell lineage commitment. In contrast, during 
S3-S4 progression, down-regulated genes significantly 
outnumbered up-regulated genes (949 versus 364 genes; 
Figure 3 A), this in combination with just 81 genes 
(corresponding to 139 DMRs) that are demethylated 



from S3 suggests that DNA methylation might not play 
a major role in regulating gene expression leading to the 
progression to immature B-cells. As expected, hierarchical 
clustering using these 3913 DEGs accurately classified the 
31 samples into 4 different stages, and S2 and S3 were the 
most similar in expression profiles (Figure 3B). The 
putative B-cell-related TF and surface antigen genes 
such as TCF3, IKZFl, EBFl, PAX5, CD19, CD79A and 
BLNK were up-regulated throughout all stages. Genes 
involved in immunoglobuhn rearrangement or functioning 
as temporary surrogate markers, such as RAGl, RAG2, 
VPREBl and IGLLl, were activated during S2 and S3 
and down-regulated again at S4. The early progenitor 
genes, SOX 2 and SPIl, were gradually down-regulated 
as well as non-B lineage genes such as GFll, GATAl 
and N0TCH3. DNA methyltransferase genes including 
DNMTl, DNMT3A and DNMT3B were progressively 
down-regulated (Supplementary Figure S3). 

Pathway enrichment analysis on the up-regulated and 
down-regulated genes between the three stage progres- 
sions (S1->S2, S2->S3 and S3->S4) identified significant 
pathways specifically in up-regulated genes between SI 
and S2, and S2 and S3, and down-regulated genes 
between S3 and S4. From SI to S2, antigen presentation 
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Figure 3. Differential expression analysis during B-cell development. (A) Scatter plots of gene expression levels of two subsequent stages during 
B-cell development. DEGs(FDR-adjusted /'<0.05 and fold changes >1.5) were highlighted with red and blue dots (up- and down-regulated, 
respectively). (B) A hierarchical clustering of 31 samples based on the expression profiles of the 3913 core DEGs classified each sample into its 
relevant stages. Heatmap colors indicate log-ratios against expression levels at SI. 
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and B-cell receptor (BCR) pathways associated with B-cell 
development were enriched in up-regulated genes, as well 
as polo-Uke kinase (PLK) and G2/M checkpoint 
pathways-associated ceU prohferation. Nuclear factor of 
activated T-cells (NFAT), hepatocyte growth factor 
(HGF), interleukin-3, protein kinase A, estrogen 
receptor, glucocorticoid receptor signahng pathways and 
protein ubiquitination pathways were also modestly 
enriched during these stages. From S2 to S3, 
Phosphoinositide 3-kinase (PI3K) and interferon 
(mediated by JAKl, JAK2 and TYK2) signaling 
pathways were specifically enriched in up-regulated 
genes. During S3->S4 progression, many genes hnked to 
cell cycle, cell division and mismatch repair were 
down-regulated. The associated signaling pathways 
include BRCAl (most significant, P = 1.6 x 10~^^), 
ataxia telangiectasia mutated (ATM), checkpoint kinase 
(CHK), PLK, cyclin and purine/pyrimidine pathways 
(Supplementary Table S2). 

Integrative analysis of metiiylation and expression changes 
during pre-B-cell development 

To investigate whether methylation alterations affect tran- 
scription, we examined the expression changes of genes 
with altered methylation levels during stage progression. 
In most cases, methylation changes did not lead to 
discernable expression differences in associated genes. 
12.4% of the genes with one or more DMRs from SI to 

52 resulted in corresponding transcriptional alterations, 
which represented a highly significant overlap (Fisher's 
exact = 1.9 X 10"*^). Similarly, from S2 to S3, 10% of 
genes with >1 DMRs had concurrent methylation and 
expression changes (Fisher's exact P < 2.2 x 10~'^), 
whereas the corresponding number from S3 to S4 was 
only 2.6% (P = 0.77). Looking among the genes that 
incurred stage-wise expression changes, the percentages 
that were accompanied with methylation changes were 
10.1, 11.9 and 0.25% for stage progression from SI to 
S2, from S2 to S3 and from S3 to S4, respectively. This 
suggests that methylation plays a pivotal role in B-cell 
lineage commitment, but mechanisms other than methy- 
lation also play an important role, especially toward pro- 
gression into immature B-cells. We listed the 224, 60 and 
7 genes in that had undergone concurrent methylation and 
expression changes from SI to S2, from S2 to S3 and from 

53 to S4, respectively, in Supplementary Table S3. 
Given the enrichment of DMRs detected at intragenic 

regions outside of CGIs, we classified all stage-wise DMRs 
into different regulatory subgroups according to their 
relative locations in the gene (Promoter, Body) and to 
CGIs (non-Island, Island, Shelf/Shore) and examined the 
relationship between differential methylation and differen- 
tial expression within each subgroup. If there were 
multiple DMRs mapped to the same gene and same regu- 
latory subgroup, we selected the DMRs with the largest 
absolute methylation changes to represent the unit. In a 
robust hnear regression analysis using the expression 
changes as the outcome and the methylation changes as 
the predictor, adjusted for baseline expression and methy- 
lation levels at the preceding stage, DMRs in aU regions. 



except for body/CGI, showed an inverse correlation with 
expression changes. Interestingly, the strongest associ- 
ation was found in the group of promoter/non-CGI 
(coefficient = -0.61, P = 0.001, n = 350); a Afi of 0.75 
in this region was estimated to correspond to about a 
1.37-fold change of expression levels. Even though the 
majority of the methylation changes (810 genes) were 
detected in body/non-CGl regions, their effects on expres- 
sion were much weaker (coefficient = —0.21, P = 0.002). 
DMRs in body/CGI were small in number but 
intriguingly showed the only positive correlation with ex- 
pression (coefficient = 0.11, P = 0.32, n = 87; Figure 4A). 
In a subset of CpGs and genes, we observed delayed or 
lagged expression changes in later stages after methylation 
changes in prior stages. A total of 48 de-DMRs from SI to 
S2 were associated with up-regulated DEGs from S2 to 
S3, and 56 de-DMRs from SI to S2 were so with those 
from S3 to S4 (Supplementary Table S4). Characteristic 
CpGs include those near the CD19 gene, which are sub- 
stantially demethylated from S 1 to S2, with accompanying 
substantial expression changes in later stages 
(Supplementary Figure S4). 

Among the 350 genes that had one or more DMRs 
detected in the promoter/non-CGI group, 25% (89) 
genes had gene expression changes of at least 1.5-fold. 
Focusing on these 89 genes, we observed that 65 of them 
(74%) had methylation and expression changes occurring 
in inverse correlation (Figure 4B). Methylation and gene 
expression changes of this 65-gene set are hsted in 
Supplementary Table S5. Examining the CpG dinucleo- 
tide density of these 65 genes, we found that a surprisingly 
high percentage (75.8%) do not contain any CGI regions 
in their entire genie region (including promoter region), 
furthermore, 86.4% had no CGI at their promoter 
region (Supplementary Figure S5A). In contrast, only 
27.1% of the transcripts on HM450 do not contain any 
overlapping CGIs in their genie region, and only 34.1% 
have non-CGI promoters (Supplementary Figure S5B). 
While the role of CGIs located in the promoter regions 
in transcriptional silencing is well-recognized, little is 
known about the regulation of the genes whose promoters 
contain no CGIs, our results show that demethylation at 
low-density CpG sites in the promoter regions has a func- 
tional significance in the establishment of pre-B-ceU 
hneage. 

Specific histone modifications have been previously 
shown to be associated with active CGI promoters (22). 
To characterize the chromatin structure associated with 
these 65 genes, we examined the histone modification 
ChlP-seq data of the lymphoblastoid ceU hne GM 12878 
from the ENCODE project. A large majority, 75% and 
66%, of these 65 genes have their non-CGI promoter 
regions marked with active histone modifications 
H3K4mel and H3K4me3, respectively. Interestingly, 
only 37% and 28% of all the non-CGI promoter regions 
on HM450 are marked with these two active histone 
modifications. The percentages are notably higher, at 
50% and 85%, for CGI promoter regions 
(Supplementary Figure S6). Our results indicate that the 
non-CGI promoters associated with expression changes 
display a histone modification profile more similar to 
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Figure 4. Association of RNA differential expression witii DNA differential methylation during B-cell development. (A) Forest plot of the coefficient 
of differential methylation in a robust linear regression analysis using the gene-wise methylation changes as the outcome and the gene-wise expression 
changes as the predictor. Methylation changes were stratified with respect to annotation categories and model fitting was carried out within each 
category. (B) Differential promoter/non-CGI methylation negatively correlates with gene expression. The 89 genes showing both a change in one or 
more DMRs in its promoter/non-CGI region and a change in gene expression {>1. 5-fold) were plotted. (+) and (— ) denote an increase and a 
decrease in expression or methylation, respectively. Numbers above the bars are percentages. 



that of CGI promoters than that of the average non-CGI 
promoters. 

Differences of methylation and expression changes 
in a gene-specific context 

Ahhough some trends were noted in methylation and 
expression changes, the situation differed gene by gene. 
A critical gene for pre-B-cell formation is CD19, whose 
promoter region was deniethylated during gene activation. 
These regions and DMRs exactly coincided with EBF 
binding sites (Supplementary Figure S7A). A second 
example is TCF3 (E2A), which has two distinct TSSs. 
The upstream TSS (TSSl; encoding isoform El 2) was 
hypomethylated at SI and its methylation level remained 
unchanged during B-cell development whereas regions 
around the second TSS encoding isoform E47 underwent 
demethylation (Supplementary Figure S7B). This suggests 
a role of methylation changes in isoform regulation. In the 
case of PAX5, its TSS overlaps with CGIs and these 
regions were hypomethylated at SI. During gene activa- 
tion, the CGI region in the middle of intron 5 became 
hypermethylated (Supplementary Figure S7C). In a 
mouse study, the 5'-part of PAX5 intron 5 was proven 
to have enhancer elements that were affected by EBFl 
binding and were demethylated during B-cell development 
(23). An example of down-regulated gene is NR1H3, a key 
regulator of macrophage function, whose CpG sites in the 
promoter region were subjected to de novo methylation 
whereas CpG sites located in the 3'-part of the gene 
body were demethylated (Supplementary Figure S7D). 



DISCUSSION 

In the present study, we compared the methylation profiles 
of early progenitor cells starting from a point prior to 
hneage commitment (predominantly MPPs and CLPs) 
through pre-B-I, pre-B-II and immature B-cells, and 
have comprehensively cataloged methylation changes 
associated with pre-B-cell development. The dense probe 
placement enabled us to examine methylation changes 
within and outside CGIs and in various locations within 
individual genes to gain insights on loco-regional epigen- 
etic control of human early B-cell development. Our 
analysis created a reference set for those studies 
investigating malignancies originating from B-cell pro- 
genitors which are the most common cancer in children. 
In combination with genome-wide RNA expression 
profiling, we have identified a distinct 
development-dependent demethylation signature which 
has gene expression regulatory properties. Altered DNA 
methylation occurred predominantly within gene bodies 
outside of CGIs, but less frequently at annotated gene 
promoters or with CGIs. 

The canonical view on epigenetic gene regulation is that 
DNA methylation in CGIs in gene promoters has a prin- 
ciple role for repressing gene expression, and the loss of 
methylation in this region is associated with gene activa- 
tion (7,8). Recently, several groups have provided 
evidence hnking gene body methylation and transcription, 
but outstanding questions remain (24). Our data suggest a 
complex mechanism of epigenetic gene regulation during 
pre-B-cell development via DNA methylation. We 
observed that DNA methylation changes more frequently 
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occur at gene body and remote upstream regions than 
promoter regions during early B-cell development 
although those at promoters still possess the most potent 
effect on gene expression. It is notable that at SI, before 
complete lineage commitment, the promoters of the 
majority of the genes are hypomethylated [P < 0.3), espe- 
cially those having CGIs in their promoters 
(Supplementary Figure S2A). Our data indicate that, 
during progression through pre-B-cell stages, these 
promoter regions maintained their liypomethylation 
status whereas alterations in methylation occurred specif- 
ically to the gene body regions of many genes. Many of the 
DMRs in the body regions may correspond to cryptic TSS 
sites or enhancer regions, which need to be suppressed or 
activated for efficient gene transcription (25). We also 
observed demethylation of remote upstream regions in 
some up-regulated genes such as NH1R3. In fact, there 
is evidence that enhancer methylation in some genes is 
functionally critical for gene regulation (26,27). This 
may be understood in hne with a previous model that 
primary TFs prime their target gene promoters at earlier 
stage in association with cell fate decisions and the gene 
expression is 'triggered' with a time lag after accumulation 
of secondary reinforcing events (5,11). On the contrary, 
some genes had their promoter methylated at basehne and 
were activated almost simultaneously with promoter 
demethylation, suggesting an instant triggering role of 
these genes. We provided one example of this in the per- 
tinent role of demethylation at low-density CpG 
(non-Island) sites in the promoter regions having the 
strongest functional impact in triggering expression 
changes (Figure 3A) in the establishment of pre-B-cell 
lineage. We also found that these non-CGI promoters 
display a histone modification profile that is more 
similar to CGI promoters than to the average non-CGI 
promoters. 

Strikingly, only demethylation processes were noted 
from stages S2 to S4, although many genes were 
down-regulated during this period. Furthermore, there 
were only seven genes that were altered in expression 
and in methylation from stages S3 to S4 (Supplementary 
Table S3); these genes included a negative regulator of 
BCR signaHng, BTLA (which was increased in expression 
in S4) and the adenosine deaminase ADARBl, a gene 
distinct from the liypermutation and switch recombinase 
gene AICDA or activation-induced cytidine deaminase 
(Supplementary Table S3C). DNA methyltransferases 
including DNMTl, DNMT3A and DNMT3B were 
down-regulated as the stages progressed, supporting the 
diminished role of methylation in later stages. Therefore, 
it may be postulated that mechanisms other than methy- 
lation have the primary role in down-regulating genes in 
later stages, which may include changes in TF activity 
from protein modification, alterations in TF interactions 
and combinatory co-binding properties, changes in 
histone modification, effects of cytokines and miRNAs, 
and altered RNA decay speeds from RNA interference 
(28-32). 

We noted that the de novo and demethylation processes 
occurred in regions enriched for specific TF binding sites, 
and these were identified in silico data using motif finding 



algorithms (Supplementary Table SI) as well as in in vivo 
data using ChlP-Seq data from the ENCODE project 
(Table 1). From SI to S2, EBF binding motifs were 
most significantly demethylated corresponding to its 
central role in B-cell development (3,4). This was 
followed by Ets, RUNXl, TCF3 and ELF5 motifs 
{in silico) and E2F family members and PAX5 {in vivo). 
The Ets family TFs consists of over 30 members sharing 
common DNA binding (ETS) domain and have significant 
but heterogeneous roles on hematopoiesis (33). 
Accordingly, de «ovo-methylated regions were also 
enriched for Ets family TFs. SPIl (PU.l), an Ets family 
TF involved in the myeloid versus B-lymphoid hneage 
decision, were also enriched both in hyper- and 
hypomethylated regions in vivo suggesting its complex 
roles (3,5). The TCF3 (E2A) is putatively known to 
activate the B-cell hneage-specific gene program synergis- 
tically with EBFl and its motifs were significantly 
enriched here (Table 1) (3,4,34). Gene famiUes associated 
with hypermethylation include targets of the myeloid TFs 
CEBPB and P300 (Table 1), a marker of lineage commit- 
ment. The SPIB motif was also enriched in 
hypermethylated regions; the molecule is specific to 
plasmacytoid dendritic cells and HSCs (35) suggesting 
that its repression is associated with binding site 
hypermethylation. 

Using expression analysis, we found that many 
pathway-related genes were up-regulated from SI to S2 
(Supplementary Table S2). As expected, antigen present- 
ing, BCR and B-cell developmental pathways were most 
significantly enriched. The PI3K pathway was suggested 
to be crucial for BCR/pre-BCR signaling (36), and protein 
ubiquitination was recently highlighted in NFkB signaHng 
(37). The HGF /i-chain was found to form a pre-pro-B-cell 
growth-stimulating factor with interleukin-7 (38), but may 
need further exploration. The role of NEAT family, HGF, 
IL-3, protein kinase A and glucocorticoid and estrogen 
receptor pathways were somewhat redundant although 
some were previously implicated in B-cell development 
(39-44). From S3 to S4, a number of genes related to 
cell division and mismatch repair were down-regulated 
with BRCAl -related pathway being most significant. 
The ATM, CHK, PLK and cycHn pathways are also 
related to cell cycle control and DNA repair. This may 
be due to the reduced need for cell division but may also 
be related to preparations for 'hypermutation' in mature 
and activated B-cells. Interestingly, a study using Brca—j— 
cell line found increased somatic hypermutation in the 
imniunoglobuHn genes (45). 

Recently a global DNA demethylation signature was 
discovered in mouse erythropoiesis (46). This 
demethylation is associated with DNA repUcation and 
occurs at all DNA elements. This process is in contrast 
with the demethylation signature that Calvanese et al. (11) 
observe in non-erythropoietic cells, which are specific and 
targeted demethylation events with functional relevance, 
e.g. Figure 2F. In contrast to erythrocytes, a functional 
and specific DNA methylation pattern is critical for 
differentiated cell function in B-cells which remain 
nucleated and many of which are long-hved. 
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In summary, we have investigated DNA methylation 
changes in early human B-cell development in association 
with expression changes. DNA methylation changes were 
associated with profound effects on gene expression 
during early lineage commitment, especially DNA methy- 
lation changes in regions other than promoters. The 
changes were non-randomly located in terms of CGIs, 
alternative TSSs and TF binding sites. The impact of 
DNA methylation on gene regulation was reduced in 
later stages of B-cell development, suggesting that mech- 
anisms other than DNA methylation may have a principal 
role after lineage commitment. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-5 and Supplementary Figures 
1-7. 
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